Generalization Through the Lens of Learning Dynamics
A machine learning (ML) system must learn not only to match the output of a
target function on a training set, but also to generalize to novel situations
in order to yield accurate predictions at deployment. In most practical
applications, the user cannot exhaustively enumerate every possible input to
the model; strong generalization performance is therefore crucial to the
development of ML systems which are performant and reliable enough to be
deployed in the real world. While generalization is well-understood
theoretically in a number of hypothesis classes, the impressive generalization
performance of deep neural networks has stymied theoreticians. In deep
reinforcement learning (RL), our understanding of generalization is further
complicated by the conflict between generalization and stability in widely-used
RL algorithms. This thesis will provide insight into generalization by studying
the learning dynamics of deep neural networks in both supervised and
reinforcement learning tasks.
We begin with a study of generalization in supervised learning. We propose new PAC-Bayes generalization bounds for invariant models and for models trained with data augmentation. We go on to consider more general forms of inductive bias, connecting a notion of training speed with Bayesian model selection. This connection yields a family of marginal likelihood estimators which require only sampled losses from an iterative gradient descent trajectory, and analogous performance estimators for neural networks. We then turn our attention to reinforcement learning, laying out the learning dynamics framework for the RL setting which will be leveraged throughout the remainder of the thesis. We identify a new phenomenon which we term capacity loss, whereby neural networks lose their ability to adapt to new target functions over the course of training in deep RL problems, for which we propose a novel regularization approach. Follow-up analysis studying more subtle forms of capacity loss reveals that deep RL agents are prone to memorization due to the unstructured form of early prediction targets, and highlights a solution in the form of distillation. We conclude by calling back to a different notion of invariance to that which started this thesis, presenting a novel representation learning method which promotes invariance to spurious factors of variation in the environment.
Comment: PhD Thesis
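The connection between training speed and marginal likelihood described above can be sketched as a prequential sum of losses along an online SGD trajectory: each incoming point's loss is recorded before the model updates on it, and a lower sum corresponds to a higher marginal-likelihood estimate. The function name and the linear-model setup below are illustrative assumptions, not the thesis's exact estimator.

```python
import numpy as np

def prequential_sum_loss(features, targets, lr=0.1):
    """Sum of "next-point" squared losses along an online SGD trajectory.

    Illustrative sketch: a linear model trained by single-pass SGD; a lower
    summed loss (faster training) stands in for a higher marginal-likelihood
    estimate in the training-speed view.
    """
    w = np.zeros(features.shape[1])
    total = 0.0
    for x, t in zip(features, targets):
        pred = w @ x
        total += 0.5 * (pred - t) ** 2  # loss recorded before the update
        w -= lr * (pred - t) * x        # SGD step on the squared loss
    return total
```

Comparing a model given the relevant feature against one given irrelevant noise features, the relevant model fits faster, accumulates a lower sum, and is therefore preferred — the model-selection behaviour the estimator is meant to capture.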
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Since their introduction a year ago, distributional approaches to
reinforcement learning (distributional RL) have produced strong results
relative to the standard approach which models expected values (expected RL).
However, aside from convergence guarantees, there have been few theoretical
results investigating the reasons behind the improvements distributional RL
provides. In this paper we begin the investigation into this fundamental
question by analyzing the differences in the tabular, linear approximation, and
non-linear approximation settings. We prove that in many realizations of the
tabular and linear approximation settings, distributional RL behaves exactly
the same as expected RL. In cases where the two methods behave differently,
distributional RL can in fact hurt performance when it does not induce
identical behaviour. We then continue with an empirical analysis comparing
distributional and expected RL methods in control settings with non-linear
approximators to tease apart where the improvements from distributional RL
methods are coming from.
Comment: To appear in the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence
Managing Networks for School Improvement: Seven Lessons from the Field
In recent decades, new networks for school improvement (NSI) have proliferated across the country. These emerging organizational structures present education leaders with an opportunity to build dynamic infrastructures to engage schools in improvements to teaching and learning. NSI are diverse. Some NSI are part of school districts, while others are contracted by school districts to design blueprints for school improvement. What all NSI have in common is a central hub supporting a set of member schools, like the center of a wheel and its spokes.
In this guidebook, we focus on common lessons for designing improvement infrastructures from the perspective of leaders across four different types of networks, including: Local district superintendents who support schools in a particular geographic area; Field support centers, which partner with district superintendents in the intermediary space between the central office and schools; Affinity organizations, which are independent non-profit organizations that work under contract from the central district office to support a select group of district schools; and Charter school management organizations that operate outside the district, supporting their affiliated member schools.
Our aim was to better understand how NSI were responding to the increased demands of recent shifts to more rigorous college- and career-ready standards. These seven lessons emerged from interviews with central office administrators overseeing NSI and staff working in network hubs, as well as from observations of professional learning (PL) sessions provided by hubs. We hope these lessons are useful to your work improving teaching and learning in your school, network, or district
The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation
We study the problem of temporal-difference-based policy evaluation in
reinforcement learning. In particular, we analyse the use of a distributional
reinforcement learning algorithm, quantile temporal-difference learning (QTD),
for this task. We reach the surprising conclusion that even if a practitioner
has no interest in the return distribution beyond the mean, QTD (which learns
predictions about the full distribution of returns) may offer performance
superior to approaches such as classical TD learning, which predict only the
mean return, even in the tabular setting.
Comment: ICML 2023
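A tabular QTD update of the kind analysed here can be sketched as follows; the function and parameter names are hypothetical, and this is a minimal quantile-regression TD step rather than the paper's exact algorithm.

```python
import numpy as np

def qtd_update(theta, r, gamma, theta_next, alpha=0.05, rng=None):
    """One tabular quantile TD (QTD) update for a single transition.

    theta: current quantile estimates of the return at a state, shape (m,).
    theta_next: quantile estimates at the successor state, shape (m,).
    Sketch only: theta[i] tracks the quantile at level tau_i = (2i + 1) / (2m).
    """
    m = len(theta)
    taus = (2.0 * np.arange(m) + 1.0) / (2.0 * m)
    rng = rng if rng is not None else np.random.default_rng(0)
    # Sample a bootstrap target from the successor's quantile distribution.
    z = r + gamma * rng.choice(theta_next)
    # Quantile regression step: theta[i] moves up (weight tau_i) when the
    # target exceeds it, and down (weight 1 - tau_i) otherwise.
    return theta + alpha * (taus - (z < theta).astype(float))
```

Iterating this update drives each theta[i] toward a quantile of the return distribution; the mean of the quantile estimates then serves as the value estimate that the paper compares against classical TD.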
Understanding plasticity in neural networks
Plasticity, the ability of a neural network to quickly change its predictions
in response to new information, is essential for the adaptability and
robustness of deep reinforcement learning systems. Deep neural networks are
known to lose plasticity over the course of training even in relatively simple
learning problems, but the mechanisms driving this phenomenon are still poorly
understood. This paper conducts a systematic empirical analysis into plasticity
loss, with the goal of understanding the phenomenon mechanistically in order to
guide the future development of targeted solutions. We find that loss of
plasticity is deeply connected to changes in the curvature of the loss
landscape, but that it often occurs in the absence of saturated units. Based on
this insight, we identify a number of parameterization and optimization design
choices which enable networks to better preserve plasticity over the course of
training. We validate the utility of these findings on larger-scale RL
benchmarks in the Arcade Learning Environment.
Comment: Accepted to ICML 2023 (oral presentation)
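One way to probe plasticity loss of the kind studied here is to reuse a single network across a sequence of fresh random regression targets and record the final loss reached on each task; rising values would indicate a declining ability to fit new targets. The setup below is an illustrative sketch with assumed names and hyperparameters, not the paper's experimental protocol.

```python
import numpy as np

def fit_task(params, X, y, lr=0.05, epochs=200):
    """Full-batch gradient descent on a one-hidden-layer ReLU net (in place).

    Returns the mean squared error at the final epoch.
    """
    W1, b1, W2, b2 = params
    for _ in range(epochs):
        h = np.maximum(0.0, X @ W1 + b1)   # hidden activations
        pred = h @ W2 + b2
        err = pred - y
        # Backprop for the mean squared loss.
        gW2 = h.T @ err / len(X)
        gb2 = err.mean(0)
        dh = (err @ W2.T) * (h > 0)
        gW1 = X.T @ dh / len(X)
        gb1 = dh.mean(0)
        W2 -= lr * gW2; b2 -= lr * gb2      # in-place, so updates persist
        W1 -= lr * gW1; b1 -= lr * gb1
    return float((err ** 2).mean())

def plasticity_probe(num_tasks=5, n=128, d=8, hidden=32, seed=0):
    """Train one network on a sequence of random targets; return per-task losses."""
    rng = np.random.default_rng(seed)
    params = [rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden)), np.zeros(hidden),
              rng.normal(0.0, 1.0 / np.sqrt(hidden), (hidden, 1)), np.zeros(1)]
    X = rng.normal(size=(n, d))
    return [fit_task(params, X, rng.normal(size=(n, 1))) for _ in range(num_tasks)]
```

Because the parameters are reused across tasks, the per-task loss trace doubles as a crude probe of the curvature and unit-saturation effects the paper investigates more systematically.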